Creating Similarity
Authors
Abstract
Just as observing is more than just seeing, comparing is far more than mere matching. It takes understanding, and even inventiveness, to discern a useful basis for judging two ideas as similar in a particular context, especially when our perspective is shaped by an act of linguistic creativity such as metaphor, simile or analogy. Structured resources such as WordNet offer a convenient hierarchical means for converging on a common ground for comparison, but offer little support for the divergent thinking that is needed to creatively view one concept as another. We describe such a means here, by showing how the web can be used to harvest many divergent views for many familiar ideas. These lateral views complement the narrow vertical view offered by WordNet, and support a system for creative idea exploration called Thesaurus Rex. We also show how Thesaurus Rex supports a novel, generative similarity measure for WordNet.
1 Seeing is Believing (and Creating)
Similarity is a cognitive phenomenon that is both complex and subjective, yet for practical reasons it is often modeled as if it were simple and objective. This makes sense for the many situations where we want to align our similarity judgments with those of others, and thus focus on the same conventional properties that others are also likely to focus upon. This reliance on the consensus viewpoint explains why WordNet (Fellbaum, 1998) has proven so useful as a basis for computational measures of lexico-semantic similarity (e.g. see Pedersen et al., 2004; Budanitsky & Hirst, 2006; Seco et al., 2006). These measures reduce the similarity of two lexical concepts to a single number, by viewing similarity as an objective estimate of the overlap in their salient qualities.
This convenient perspective is poorly suited to comparisons that are creative or insightful, yet it is sufficient for the many mundane comparisons that one tacitly performs in daily life, such as when we organize our books or look for items in a supermarket. So if we do not know in which aisle to locate a given item (such as oatmeal), we may tacitly know how to locate a similar product (such as cornflakes) and orient ourselves accordingly. Yet there are occasions when the recognition of similarities spurs the creation of similarities, when the act of comparison spurs us to invent new ways of looking at an idea. By placing pop tarts in the breakfast aisle, food manufacturers encourage us to view them as a breakfast food that is not dissimilar to oatmeal or cornflakes. When ex-PM Tony Blair published his memoirs, a mischievous activist encouraged others to move his book from Biography to Fiction in bookshops, in the hope that buyers would see it in a new light. Whenever we use a novel metaphor to convey a non-obvious viewpoint on a topic, such as “cigarettes are time bombs”, the comparison spurs an audience to insight, to see aspects of the topic that make it more similar to the vehicle (see Ortony, 1979; Veale & Hao, 2007). In formal terms, assume agent A has an insight about concept X, and uses the metaphor X is a Y to also provoke this insight in agent B. To arrive at this insight for itself, B must intuit what X and Y have in common. But this commonality is surely more than a standard categorization of X, or else it would not count as an insight about X. To understand the metaphor, B must place X in a new category, so that X can be seen as more similar to Y. Metaphors shape the way we perceive the world by re-shaping the way we make similarity judgments. So if we want to imbue computers with the ability to make and to understand creative metaphors, we must first give them the ability to look beyond the narrow viewpoints of conventional resources. 
Any measure that models similarity as an objective function of a conventional worldview employs a convergent thought process. Using WordNet, for instance, a similarity measure can vertically converge on a common superordinate category of both inputs, and generate a single numeric result based on their distance to, and the information content of, this common generalization. So to find the most conventional ways of seeing a lexical concept, one simply ascends a narrowing concept hierarchy, using a process de Bono (1970) calls vertical thinking. To find novel, non-obvious and useful ways of looking at a lexical concept, one must use what Guilford (1967) calls divergent thinking and what de Bono calls lateral thinking. These processes cut across familiar category boundaries, to simultaneously place a concept in many different categories so that we can see it in many different ways. de Bono argues that vertical thinking is selective while lateral thinking is generative. Whereas vertical thinking concerns itself with the “right” way or a single “best” way of looking at things, lateral thinking focuses on producing alternatives to the status quo. To be as useful for creative tasks as they are for conventional tasks, we need to reimagine our computational similarity measures as generative rather than selective, expansive rather than reductive, divergent as well as convergent and lateral as well as vertical. Though WordNet is ideally structured to support vertical, convergent reasoning, its comprehensive nature means it can also be used as a solid foundation for building a more lateral and divergent model of similarity. Here we will use the web as a source of diverse perspectives on familiar ideas, to complement the conventional and often narrow views codified by WordNet. 
Section 2 provides a brief overview of past work in the area of similarity measurement, before section 3 describes a simple bootstrapping loop for acquiring richly diverse perspectives from the web for a wide variety of familiar ideas. These perspectives are used to enhance a WordNet-based measure of lexico-semantic similarity in section 4, by broadening the range of informative viewpoints the measure can select from. Similarity is thus modeled as a process that is both generative and selective. This lateral-and-vertical approach is evaluated in section 5, on the Miller & Charles (1991) dataset. A web app for the lateral exploration of diverse viewpoints, named Thesaurus Rex, is also presented, before closing remarks are offered in section 6.
2 Related Work and Ideas
WordNet’s taxonomic organization of noun-senses and verb-senses – in which very general categories are successively divided into increasingly informative sub-categories or instance-level ideas – allows us to gauge the overlap in information content, and thus of meaning, of two lexical concepts. We need only identify the deepest point in the taxonomy at which this content starts to diverge. This point of divergence is often called the LCS, or least common subsumer, of two concepts (Pedersen et al., 2004). Since sub-categories add new properties to those they inherit from their parents – Aristotle called these properties the differentia that stop a category system from trivially collapsing into itself – the depth of a lexical concept in the taxonomy is an intuitive proxy for its information content. Wu & Palmer (1994) thus estimate the similarity of two lexical concepts as twice the depth of their LCS divided by the sum of their individual depths. Leacock and Chodorow (1998) instead use the length of the shortest path between two concepts as a proxy for the conceptual distance between them.
To connect two ideas in a hierarchical system, one must vertically ascend the hierarchy from one concept, change direction at a potential LCS, and then descend the hierarchy to reach the second concept. (Aristotle was also first to suggest this approach, in his Poetics.) Leacock and Chodorow normalize the length of this path by dividing its size (in nodes) by twice the depth of the deepest concept in the hierarchy; the latter is an upper bound on the distance between any two concepts in the hierarchy. Negating the log of this normalized length yields a corresponding similarity score. While the role of an LCS is merely implied by Leacock and Chodorow’s hierarchical use of a shortest path, the LCS is pivotal nonetheless, and like that of Wu & Palmer, the approach uses an essentially vertical reasoning process to identify a single “best” generalization.
Depth is a convenient proxy for information content, but more nuanced proxies can yield more rounded similarity measures. Resnik (1995) draws on information theory to define the information content of a lexical concept as the negative log likelihood of its occurrence in a corpus, either explicitly (via a direct mention) or by presupposition (via a mention of any of its sub-categories or instances). Since the likelihood of a general category occurring in a corpus is higher than that of any of its sub-categories or instances, such categories are more predictable, and less informative, than rarer categories whose occurrences are less predictable and thus more informative. The negative log likelihood of the most informative LCS of two lexical concepts offers a reliable estimate of the amount of information shared by those concepts, and thus a good estimate of their similarity.
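
The three measures described so far can be sketched on a tiny hand-made taxonomy. This is only an illustration under stated assumptions: the `PARENT` hierarchy and the `COUNTS` corpus frequencies are hypothetical stand-ins rather than WordNet or real corpus data, and depth is counted in nodes with the root at depth 1.

```python
import math

# Hypothetical toy taxonomy (child -> parent); a stand-in for WordNet.
PARENT = {
    "entity": None,
    "food": "entity",
    "cereal": "food",
    "oatmeal": "cereal",
    "cornflakes": "cereal",
    "beverage": "food",
    "cola": "beverage",
}

# Hypothetical counts of direct corpus mentions for each concept.
COUNTS = {"entity": 10, "food": 20, "cereal": 5, "oatmeal": 3,
          "cornflakes": 2, "beverage": 10, "cola": 10}

def ancestors(c):
    """Chain from c up to the root, including c itself."""
    chain = []
    while c is not None:
        chain.append(c)
        c = PARENT[c]
    return chain

def depth(c):
    """Depth in nodes; the root has depth 1."""
    return len(ancestors(c))

def lcs(a, b):
    """Least common subsumer: the deepest ancestor shared by a and b."""
    seen = set(ancestors(a))
    for node in ancestors(b):  # walks upward, so the first hit is the deepest
        if node in seen:
            return node
    raise ValueError("concepts share no ancestor")

def wu_palmer(a, b):
    """Twice the depth of the LCS over the sum of the two depths."""
    return 2 * depth(lcs(a, b)) / (depth(a) + depth(b))

def path_nodes(a, b):
    """Number of nodes on the path from a up to the LCS and down to b."""
    shared = depth(lcs(a, b))
    return (depth(a) - shared) + (depth(b) - shared) + 1

def leacock_chodorow(a, b):
    """Negated log of the path length normalized by twice the maximum depth."""
    max_depth = max(depth(c) for c in PARENT)
    return -math.log(path_nodes(a, b) / (2 * max_depth))

def resnik_ic(c):
    """Negative log likelihood of c, counting mentions of c and its descendants."""
    total = sum(COUNTS.values())
    covered = sum(n for node, n in COUNTS.items() if c in ancestors(node))
    return -math.log(covered / total)

def resnik(a, b):
    """Information content of the LCS of a and b."""
    return resnik_ic(lcs(a, b))

for pair in [("oatmeal", "cornflakes"), ("oatmeal", "cola")]:
    print(pair, wu_palmer(*pair), leacock_chodorow(*pair), resnik(*pair))
```

In this toy hierarchy all three measures rank oatmeal closer to cornflakes than to cola, because the first pair meets at the deeper and rarer category cereal while the second pair meets only at food.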
Lin (1998) combines the intuitions behind Resnik’s metric and that of Wu and Palmer to estimate the similarity of two lexical concepts as an information ratio: twice the information content of their LCS divided by the sum of their individual information contents. Jiang and Conrath (1997) consider the converse notion of dissimilarity, noting that two lexical concepts are dissimilar to the extent that each contains information that is not shared by the other. So if the information content of their most informative LCS is a good measure of what they do share, then the sum of their individual information contents, minus twice the content of their most informative LCS, is a reliable estimate of their dissimilarity. Seco et al. (2006) present a minor innovation, showing how Resnik’s notion of information content can be calculated without the use of an external corpus. Rather, when using Resnik’s metric (or that of Lin, or Jiang and Conrath) for measuring the similarity of lexical concepts in WordNet, one can use the category structure of WordNet itself to estimate information content. Typically, the more general a concept, the more descendants it will possess. Seco et al. thus estimate the information content of a lexical concept as one minus a ratio: the log of the number of its unique descendants (both direct and indirect, plus one), divided by the log of the total number of concepts in the entire hierarchy. Concepts with many descendants are thereby scored as less informative. Not only is this intrinsic view of information content convenient to use, requiring no external corpus, but Seco et al. show that it offers a better estimate of information content than its extrinsic, corpus-based alternatives, as measured relative to the average similarity ratings offered by humans for the 30 word-pairs in the Miller & Charles (1991) test set.
A similarity measure can draw on other sources of information besides WordNet’s category structures.
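
The information-ratio measures of Lin and of Jiang and Conrath can be sketched with an intrinsic, corpus-free information content in the style of Seco et al., taken here in its published form IC(c) = 1 − log(hypo(c) + 1) / log(N), where hypo(c) is the number of descendants of c and N is the total number of concepts. The `PARENT` taxonomy below is a hypothetical toy hierarchy, not WordNet, so the values are purely illustrative.

```python
import math

# Hypothetical toy taxonomy (child -> parent); a stand-in for WordNet.
PARENT = {
    "entity": None,
    "food": "entity",
    "cereal": "food",
    "oatmeal": "cereal",
    "cornflakes": "cereal",
    "beverage": "food",
    "cola": "beverage",
}

def ancestors(c):
    """Chain from c up to the root, including c itself."""
    chain = []
    while c is not None:
        chain.append(c)
        c = PARENT[c]
    return chain

def lcs(a, b):
    """Deepest ancestor shared by a and b."""
    seen = set(ancestors(a))
    for node in ancestors(b):
        if node in seen:
            return node
    raise ValueError("concepts share no ancestor")

def descendant_count(c):
    """Number of concepts strictly below c (direct and indirect)."""
    return sum(1 for node in PARENT if node != c and c in ancestors(node))

def intrinsic_ic(c):
    """Intrinsic IC: leaves score 1, the root scores 0."""
    return 1 - math.log(descendant_count(c) + 1) / math.log(len(PARENT))

def lin(a, b):
    """Twice the IC of the LCS over the sum of the individual ICs."""
    denominator = intrinsic_ic(a) + intrinsic_ic(b)
    if denominator == 0:  # both concepts are the root; treat as no shared information
        return 0.0
    return 2 * intrinsic_ic(lcs(a, b)) / denominator

def jiang_conrath_distance(a, b):
    """Sum of the individual ICs minus twice the IC of the LCS."""
    return intrinsic_ic(a) + intrinsic_ic(b) - 2 * intrinsic_ic(lcs(a, b))

for pair in [("oatmeal", "cornflakes"), ("oatmeal", "cola")]:
    print(pair, round(lin(*pair), 3), round(jiang_conrath_distance(*pair), 3))
```

Lin yields a similarity (higher is closer) while Jiang and Conrath yield a distance (lower is closer), so the two orderings run in opposite directions over the same pairs.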
One might eke out additional information from WordNet’s textual glosses, as in Lesk (1986), or use category structures other than those offered by WordNet. Looking beyond WordNet, entries in the online encyclopedia Wikipedia are not only connected by a dense topology of lateral links, they are also organized by a rich hierarchy of overlapping categories. Strube and Ponzetto (2006) show how Wikipedia can support a measure of similarity (and relatedness) that better approximates human judgments than many WordNet-based measures. Nonetheless, WordNet can be a valuable component of a hybrid measure, and Agirre et al. (2009) use an SVM (support vector machine) to combine information from WordNet with information harvested from the web. Their best similarity measure achieves a remarkable 0.93 correlation with human judgments on the Miller & Charles word-pair set. Similarity is not always applied to pairs of concepts; it is sometimes analogically applied to pairs of pairs of concepts, as in proportional analogies of the form A is to B as C is to D (e.g., hacks are to writers as mercenaries are to soldiers, or chisels are to sculptors as scalpels are to surgeons). In such analogies, one is really assessing the similarity of the unstated relationship between each pair of concepts: thus, mercenaries are soldiers whose allegiance is paid for, much as hacks are writers with income-driven loyalties; sculptors use chisels to carve stone, while surgeons use scalpels to cut or carve flesh. Veale (2004) used WordNet to assess the similarity of A:B to C:D as a function of the combined similarity of A to C and of B to D. In contrast, Turney (2005) used the web to pursue a more divergent course, to represent the tacit relationships of A to B and of C to D as points in a high-dimensional space. The dimensions of this space initially correspond to linking phrases on the web, before these dimensions are significantly reduced using singular value decomposition (SVD). 
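
The pairwise view of analogy attributed to Veale (2004) above can be sketched as a choice among candidate C:D pairs. The source says only that the score is "a function of the combined similarity" of A to C and of B to D, so the simple mean used here is an assumption, and the similarity values in `TOY_SIM` are hypothetical numbers chosen for illustration, not outputs of any WordNet measure.

```python
# Hypothetical pairwise similarity scores, stored symmetrically.
TOY_SIM = {
    frozenset(["hack", "mercenary"]): 0.7,
    frozenset(["writer", "soldier"]): 0.5,
    frozenset(["hack", "singer"]): 0.3,
    frozenset(["writer", "songwriter"]): 0.8,
}

def toy_sim(x, y):
    """Look up a symmetric similarity score; unknown pairs score 0."""
    if x == y:
        return 1.0
    return TOY_SIM.get(frozenset([x, y]), 0.0)

def analogy_score(a, b, c, d, sim=toy_sim):
    """Assumed combination: the mean of sim(a, c) and sim(b, d)."""
    return (sim(a, c) + sim(b, d)) / 2

def best_candidate(a, b, candidates, sim=toy_sim):
    """Pick the C:D pair that best parallels A:B under analogy_score."""
    return max(candidates, key=lambda cd: analogy_score(a, b, cd[0], cd[1], sim))

candidates = [("singer", "songwriter"), ("mercenary", "soldier")]
print(best_candidate("hack", "writer", candidates))
```

Averaging both alignments keeps a distractor from winning on the strength of a single close match, such as writer and songwriter in this toy table.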
In the infamous SAT test, an analogy A:B::C:D has four other pairs of concepts that serve as likely distractors (e.g. singer:songwriter for hack:writer) and the goal is to choose the most appropriate C:D pair for a given A:B pairing. Using variants of Wu and Palmer (1994) on the 374 SAT analogies of Turney (2005), Veale (2004) reports a success rate of 38–44% using only WordNet-based similarity. In contrast, Turney (2005) reports up to 55% success on the same analogies, partly because his approach aims to match implicit relations rather than explicit concepts, and partly because it uses a divergent process to gather from the web as rich a perspective as it can on these latent relationships.
2.1 Clever Comparisons Create Similarity
Each of these approaches to similarity is a user of information, rather than a creator, and each fails to capture how a creative comparison (such as a metaphor) can spur a listener to view a topic from an atypical perspective. Camac & Glucksberg (1984) provide experimental evidence for the claim that “metaphors do not use preexisting associations to achieve their effects [...] people use metaphors to create new relations between concepts.” They also offer a salutary reminder of an often overlooked fact: every comparison exploits information, but each is also a source of new information in its own right. Thus, “this cola is acid” reveals a different perspective on cola (e.g. as a corrosive substance or an irritating food) than “this acid is cola” highlights for acid (e.g. as a familiar substance). Veale & Keane (1994) model the role of similarity in realizing the long-term perlocutionary effect of an informative comparison. For example, to compare surgeons to butchers is to encourage one to see all surgeons as more bloody, crude or careless. The reverse comparison, of butchers to surgeons, encourages one to see butchers as more skilled and precise.
Veale & Keane present a network model of memory, called Sapper, in which activation can spread between related concepts, thus allowing one concept to prime the properties of a neighbor. To interpret an analogy, Sapper lays down new activation-carrying bridges in memory between analogical counterparts, such as between surgeon and butcher, flesh and meat, or scalpel and cleaver. Comparisons thus have lasting effects on how Sapper sees the world, changing the pattern of activation that arises whenever it primes a concept. Veale (2003) adopts a similarly dynamic view of similarity in WordNet, showing how an analogical comparison can result in the automatic addition of new categories and relations to WordNet itself. Veale considers the problem of finding an analogical mapping between different parts of WordNet’s noun-sense hierarchy, such as between instances of Greek god and Norse god, or between the letters of different alphabets, such as of Greek and Hebrew. But no structural similarity measure for WordNet exhibits enough discernment to assign, for example, a higher similarity to Zeus & Odin (each is the supreme deity of its pantheon) than to a pairing of Zeus and any other Norse god, just as no structural measure will assign a higher similarity to Alpha & Aleph or to Beta & Beth than to any random letter pairing. A fine-grained category hierarchy permits fine-grained similarity judgments, and though WordNet is useful, its sense hierarchies are not especially fine-grained. However, we can automatically make WordNet subtler and more discerning, by adding new fine-grained categories to unite lexical concepts whose similarity is not reflected by any existing categories. Veale (2003) shows how a property that is found in the glosses of two lexical concepts of the same depth can be combined with their LCS to yield a new fine-grained parent category, e.g. “supreme” + deity = Supreme-deity (for Odin, Zeus, Jupiter, etc.) and “1st” + letter = 1st-letter (for Alpha, Aleph, etc.).
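
The gloss-based category creation described above can be sketched as follows. Everything concrete here is an assumption made for illustration: the `GLOSS` strings are invented paraphrases rather than WordNet's actual glosses, the `DEPTH` values are hypothetical, and plain word overlap with a small stopword list stands in for whatever property extraction Veale (2003) actually uses.

```python
# Hypothetical glosses and depths; not taken from WordNet.
GLOSS = {
    "Zeus": "supreme ruler of the Greek gods",
    "Odin": "supreme creator in Norse myth",
    "Thor": "Norse thunder god",
}
DEPTH = {"Zeus": 9, "Odin": 9, "Thor": 9}
STOPWORDS = {"the", "of", "in", "a", "an", "and"}

def content_words(concept):
    """Lower-cased gloss words with stopwords removed."""
    return {w for w in GLOSS[concept].lower().split() if w not in STOPWORDS}

def new_categories(a, b, lcs_label):
    """Combine each gloss word shared by a and b with their LCS label.

    Concepts at different depths yield no new category, mirroring the
    same-depth condition stated in the text.
    """
    if DEPTH[a] != DEPTH[b]:
        return set()
    shared = content_words(a) & content_words(b)
    return {f"{prop}-{lcs_label}" for prop in shared}

print(new_categories("Zeus", "Odin", "deity"))
print(new_categories("Zeus", "Thor", "deity"))
```

With these toy glosses, Zeus and Odin are united under supreme-deity while Zeus and Thor gain no new shared parent, which is the kind of discernment the structural measures lack.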
Selected aspects of the textual similarity of two WordNet glosses – the key to similarity in Lesk (1986) – can thus be reified into a lasting and explicitly categorical WordNet form.
3 Divergent Forms of (Re)Categorization
To tap into a richer source of concept properties than WordNet’s glosses, we can use web n-grams. Consider these descriptions of a cowboy from the Google n-grams (Brants & Franz, 2006). The numbers to the right are Google frequency counts.
a lonesome cowboy      432
a mounted cowboy       122
a grizzled cowboy      74
a swaggering cowboy    68
To find the stable properties that can underpin a meaningful fine-grained category for cowboy, we must seek out the properties that are so often presupposed to be salient for all cowboys that one can use them to anchor a simile, such as “swaggering like a cowboy” or “as grizzled as a cowboy”. So for each property P suggested by Google n-grams for a lexical concept C, we generate a like-simile for verbal behaviors such as swaggering and an as-as-simile for adjectives such as lonesome. Each is then dispatched to Google as a phrasal query. We value quality over size, as these similes will later be used to find diverse viewpoints on the web via bootstrapping. We thus manually filter each web simile, to weed out any that are ill-formed, and those intended to be seen as ironic by their authors. This gives us a body of 12,000+ valid web similes.
Veale (2011, 2012, 2013) notes that web uses of the pattern “as P as C” are rife with irony. In contrast, web instances of “P S such as C” – where S denotes a superordinate of C – are rarely ironic. Hao & Veale (2010) exploit this fact to filter ironic comparisons from web similes, by re-expressing each “as P as C” simile as “P * such as C” (using a wildcard * to match any values for S) and looking for attested uses of this new form on the web.
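
The query patterns above can be sketched as plain string templates. The function names are hypothetical, the n-gram parser assumes the three-word "article property concept count" layout of the cowboy examples, and no search is actually issued: the sketch only builds the quoted phrasal queries that would be sent to a search engine.

```python
def parse_ngram(line):
    """Split 'a lonesome cowboy 432' into (property, concept, count)."""
    parts = line.split()
    return parts[1], parts[2], int(parts[-1])

def simile_query(prop, concept, is_verbal):
    """Build a like-simile for verbal behaviors, an as-as-simile otherwise."""
    article = "an" if concept[0].lower() in "aeiou" else "a"
    if is_verbal:
        return f'"{prop} like {article} {concept}"'
    return f'"as {prop} as {article} {concept}"'

def irony_check_query(prop, concept):
    """Re-express a simile as 'P * such as C', with * standing for any S."""
    return f'"{prop} * such as {concept}"'

prop, concept, count = parse_ngram("a swaggering cowboy 68")
print(simile_query(prop, concept, is_verbal=True))
print(simile_query("grizzled", "cowboy", is_verbal=False))
print(irony_check_query("fizzy", "cola"))
```

Whether a property is verbal or adjectival is passed in explicitly here; a fuller version would need a part-of-speech decision that the text does not specify.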
Since each hit will also yield a value for S via the wildcard *, and a fine-grained category P-S for C, we use this approach here to harvest fine-grained categories from the web for most of our similes. Once C is seen to be an exemplary member of the category P-S, such as cola in fizzy-drink, a targeted web search is used to find other members of P-S, via the anchored query “P S such as * and C”. For example, “fizzy drinks such as * and cola” will retrieve web texts in which * is matched to soda or lemonade. Each new member can then be used to instantiate a further query, as in “fizzy drinks such as * and soda”, to retrieve other members of P-S, such as champagne and root beer. This bootstrapping process runs in successive cycles, using doubly-anchored patterns that – following Kozareva et al. (2008) and Veale et al. (2009) – explicitly mention both the category to be populated (P-S) and a recently acquired member of this category (C). As cautioned by Kozareva et al., it is reckless to bootstrap from members to categories to members again if each enfilade of queries is likely to return noisy results. A reliable filter must be applied at each stage, to ensure that any member C that is placed in a category P-S is a sensible member of the category S. Only by filtering in this way can we stop the rapid accumulation of noise. For instance, a WordNet-based filter can discard any categorization statement “P S such as X and C” where X does not denote a WordNet entry for which S denotes a valid hypernym. Such a filter offers no creative latitude, however, since it forces every pairing of C and P-S to precisely obey WordNet’s category hierarchy. We use instead the near-miss filter described in Veale et al. (2009), in which X must denote a descendant of some direct hypernym of some sense of S. The filter does not (and cannot) determine whether P is salient for X. It merely assumes that if P is salient for C, it is salient for X.
Figure 1. Fine-grained perspectives for cola found by Thesaurus Rex on the web. See also Figures 3 and 4.
Five successive cycles of bootstrapping are performed, using the 12,000+ web similes as a starting point. Consider cola: after 1 cycle, we acquire 14 new categories, such as effervescent-beverage and sweet-beverage. After 2 cycles we acquire 43 categories; after 3 cycles, 72; after 4 cycles, 93; and after 5 cycles, we acquire 102 fine-grained perspectives on cola, such as stimulating-drink and corrosive-substance. These alternative viewpoints, for a broad array of concepts, are gleaned from the collective intelligence of the web. Some are more discerning and informative than others – see for instance war & divorce in Figure 1 – though as de Bono (1971) notes, lateral thinking does not privilege a narrow set of “correct” viewpoints; rather, it generates a broad array of interesting alternatives, none of which are ever “wrong”, even if some prove more useful than others in a given context.
4 Measuring and Creating Similarity
Which perspectives will be most useful and informative to a WordNet-based similarity metric? Simply, a perspective M-Cx for a concept Cy can be coherently added to WordNet iff Cx denotes a hypernym of some sense of Cy in WordNet. For purposes of quantifying the similarity of two terms t1 and t2 – by finding the WordNet senses of these terms that exhibit the highest similarity – we can augment WordNet with the perspectives on t1 and t2 that are coherent with WordNet’s hierarchy. So for t1 = cola and t2 = acid, corrosive-substance offers a coherent new perspective on each, slotting in beneath the matching WordNet sense of substance. A category system is a structured feature space. We estimate the similarity of C1 and C2 in WordNet as the cosine of the angle between the richest feature vectors we construct for each. The dimensions of these vectors are the atomic hypernyms (direct or indirect) of C1 and C2.
The value of a dimension H in a feature vector is the information content (IC) of the corresponding hypernym H:
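
As a rough illustration of the vector construction described in this section, the sketch below computes the cosine between sparse hypernym vectors whose values are information contents. It does not reconstruct the paper's IC formula: the `IC` values and hypernym sets are hypothetical, and treating the added perspective corrosive-substance as one more dimension alongside the atomic hypernyms is an assumption of this sketch.

```python
import math

# Hypothetical information-content values for a handful of categories.
IC = {
    "substance": 0.2,
    "food": 0.4,
    "beverage": 0.6,
    "chemical": 0.4,
    "compound": 0.5,
    "corrosive-substance": 0.8,
}

def feature_vector(hypernyms):
    """Sparse vector mapping each hypernym H to its (assumed) IC value."""
    return {h: IC[h] for h in hypernyms}

def cosine(u, v):
    """Cosine of the angle between two sparse vectors stored as dicts."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

# Hypothetical hypernym sets, before and after adding the shared perspective.
cola = {"substance", "food", "beverage"}
acid = {"substance", "chemical", "compound"}
shared = {"corrosive-substance"}

before = cosine(feature_vector(cola), feature_vector(acid))
after = cosine(feature_vector(cola | shared), feature_vector(acid | shared))
print(round(before, 3), round(after, 3))
```

Because the shared perspective is both common to the two concepts and highly informative, it raises the cosine sharply in this toy setting, which is the generative effect the section argues for.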
Similar resources
Determining specific species and the species contribution in the similarity between soil seed bank and standing vegetation Case study: Lazour rangeland- Firouzkooh
Determining the potential of soil seed bank and its specific species is important for conservation goals and vegetation restoration of rangelands. In this study, the characteristics of soil seed bank and standing vegetation in Lazour mountain rangeland were investigated in order to estimate the rehabilitation ability of the study area in case of possible disturbances. In order to determine the ...
Fast Hausdorff Trajectory Similarity on Spatial Networks using Virtual Nodes
Given a set of trajectories on a spatial network, the goal of the Network Hausdorff Distance Trajectory Similarity Matrix (NHDTSM) problem is to quickly calculate the commonly used network Hausdorff distance between all pairs of input trajectories. This problem is important to a variety of domains using trajectories, such as transportation services interested in finding primary corridors for pu...
Learning Similarity Functions for Event Identification using Support Vector Machines
Every clustering algorithm requires a similarity measure, ideally optimized for the task in question. In this paper we are concerned with the task of identifying events in social media data and address the question of how a suitable similarity function can be learned from training data for this task. The task consists essentially in grouping social media documents by the event they belong to. I...
On Creating Reference Templates for Speaker Independent Recognition of Isolated Words
The three aspects of a statistical approach to a pattern recognition problem are the selection of features, choice of a measure of similarity, and a method for creating the reference templates (patterns) used in the statistical tests. This paper discusses a philosophy for creating reference templates for a speaker independent, isolated word recognition system. Although there remain many unanswe...
The Basic Principles of Metric Indexing
This chapter describes several methods of similarity search, based on metric indexing, in terms of their common, underlying principles. Several approaches to creating lower bounds using the metric axioms are discussed, such as pivoting and compact partitioning with metric ball regions and generalized hyperplanes. Finally, pointers are given for further exploration of the subject, including non-...
The Effects of Speech Rate Similarity on Compliance: Application of Communication Accommodation Theory
This experiment tested a communication accommodation theory (CAT) explanation for the effects of speaker speech rate on compliance with a request for help. It was predicted that communicators' speech rate similarity increases social attractiveness and creates relational obligations to comply. Nine speech rates were presented to assess preferences for speech rate and speech rate similarity. Four...